Using F0 Contour Generation Process Model for Improved and Flexible Control of Prosodic Features in HMM-based Speech Synthesis

Authors

Keikichi Hirose

Keiko Ochi

Miaomiao Wang

Tatsuya Matsuda

Miaomiao Wen

Nobuaki Minematsu

Abstract

The generation process model of fundamental frequency contours, known as Fujisaki's model, is ideal for representing global features of prosody. It is a command-response model in which the commands have clear relations to the linguistic and para-/non-linguistic information included in the utterance. Controlling fundamental frequency contours in the framework of the generation process model therefore makes more flexible control of prosodic features possible in speech synthesis. The model can also be used to solve problems of HMM-based speech synthesis that arise from the frame-by-frame treatment of fundamental frequencies. This paper presents two methods for improved control of prosodic features in HMM-based speech synthesis and one method for flexible fundamental frequency control that realizes prosodic focus in synthetic speech. All three methods are based on the generation process model.
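
As a rough illustration of the command-response idea, the sketch below computes an F0 contour from phrase and accent commands using the commonly cited form of the generation process model: log F0 is the sum of a baseline value, impulse responses to phrase commands, and step responses to accent commands. The function names, the command values, and the constants alpha = 3.0, beta = 20.0 and gamma = 0.9 are illustrative assumptions and are not taken from this paper.

```python
import math

def phrase_response(t, alpha=3.0):
    """Impulse response of the phrase control mechanism (zero before onset)."""
    if t < 0:
        return 0.0
    return alpha * alpha * t * math.exp(-alpha * t)

def accent_response(t, beta=20.0, gamma=0.9):
    """Step response of the accent control mechanism, capped at gamma."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)

def log_f0(t, fb, phrase_cmds, accent_cmds):
    """ln F0(t) = ln Fb + phrase components + accent components.

    phrase_cmds: list of (onset_time, magnitude)
    accent_cmds: list of (onset_time, offset_time, amplitude)
    """
    value = math.log(fb)
    for onset, magnitude in phrase_cmds:
        value += magnitude * phrase_response(t - onset)
    for onset, offset, amplitude in accent_cmds:
        value += amplitude * (accent_response(t - onset)
                              - accent_response(t - offset))
    return value

def f0_contour(times, fb, phrase_cmds, accent_cmds):
    """Return F0 in Hz at each time point (seconds)."""
    return [math.exp(log_f0(t, fb, phrase_cmds, accent_cmds)) for t in times]

# Hypothetical commands: one phrase command at 0 s, one accent from 0.3 s to 0.8 s.
times = [i * 0.01 for i in range(200)]
phrase_cmds = [(0.0, 0.5)]
accent_cmds = [(0.3, 0.8, 0.4)]
contour = f0_contour(times, 120.0, phrase_cmds, accent_cmds)

# Raising the accent amplitude lifts F0 over the accented span only.
emphasized = f0_contour(times, 120.0, phrase_cmds, [(0.3, 0.8, 0.6)])
```

Because the contour is determined by a handful of commands rather than by per-frame values, changing one amplitude reshapes F0 over a whole word or phrase. Scaling an accent amplitude as in the last line is only one simple way to picture emphasis; it is not necessarily how the paper realizes prosodic focus.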


Similar resources

Fundamental Frequency Contour Reshaping in HMM-based Speech Synthesis and Realization of Prosodic Focus Using Generation Process Model

Frame-by-frame representation is not appropriate for prosodic features, which are tightly related to speech units spanning a wide time range, such as words and phrases. This causes an inherent problem in fundamental frequency (F0) contour generation by HMM-based speech synthesis. Our formerly developed method, which modifies generated F0 contours in the framework of the generation process mo...


Separation of Voiced Source Charac Transfer Function Characteristics Fo Analysis Based on Ar-h

A new method was developed for the separation of source and transfer function characteristics of speech sounds, with the aim of applying it to “flexible” speech synthesis. The method is based on representing the source waveform by an HMM and the transfer function by an AR process (AR-HMM model). As compared to methods based on ARX model, where a parametric representation is assumed for source wavefor...


Hierarchical stress modeling and generation in Mandarin for expressive Text-to-Speech

Expressive speech synthesis has received increased attention in recent times. Stress (or pitch accent) is the perceptual prominence within words or utterances, which contributes to the expressivity of speech. This paper summarizes our contribution to Mandarin expressive speech synthesis. A novel hierarchical stress modeling and generation method for Mandarin is proposed and further integrated i...


Speech enhancement based on hidden Markov model using sparse code shrinkage

This paper presents a new hidden Markov model-based (HMM-based) speech enhancement framework based on independent component analysis (ICA). We propose analytical procedures for training clean speech and noise models by the Baum re-estimation algorithm and present a maximum a posteriori (MAP) estimator based on Laplace-Gaussian (for clean speech and noise respectively) combination in the HMM ...


Improved generation of prosodic features in HMM-based Mandarin speech synthesis

The HMM-based text-to-speech (TTS) system can produce high-quality synthetic speech with flexible modeling of spectral and prosodic parameters. However, the prosodic features, like F0 and duration trajectories, generated by HMM-based speech synthesis are often excessively smoothed and lack prosodic variance. In HMM-based TTS, durations are typically modeled statistically using state duration probabili...




Publication date: 2010
